Clustered Modulo Scheduling in a VLIW Architecture with Distributed Cache
Abstract
Clustering is an approach that many microprocessors have recently adopted to mitigate the increasing penalties of wire delays. In this work we propose a novel clustered VLIW architecture in which all resources, including the cache memory, are partitioned among clusters. A modulo scheduling scheme for this architecture is also proposed. This algorithm takes into account both register and memory inter-cluster communications, so that the final schedule yields a cluster assignment that favors cluster locality in cache references and register accesses. It has been evaluated for both 2- and 4-cluster configurations and for different numbers and latencies of inter-cluster buses. The proposed algorithm produces schedules with very low communication requirements and outperforms previous cluster-oriented schedulers.
Similar resources
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures
Clustered organizations are becoming a common trend in the design of VLIW architectures. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduling in a single pass, which is shown to be more effective than doing first the assignment and later the scheduling. We also show that loop unro...
Partitioned Schedules for Clustered VLIW Architectures
This paper presents results on a new approach to partitioning a modulo-scheduled loop for distributed execution on parallel clusters of functional units organized as a VLIW machine. A distinctive characteristic of this architecture is the use of register files organized by means of queues, which results in a number of advantages over conventional schemes, but also requires the development of sp...
Thesis - Vasileios Porpodas
Very Long Instruction Word (VLIW) processors are wide-issue statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor for its operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while improving energy efficiency and frequency. Their design is based on physically partitio...
Smart Memory Management through Locality Analysis
Cache memories were incorporated in microprocessors in the early times and represent the most common solution to deal with the gap between processor and memory speeds. However, many studies point out that the cache storage capacity is wasted many times, which means a direct impact in processor performance. Although a cache is designed to exploit different types of locality, all memory reference...
On the Effectiveness of the Scheduling Algorithm of the Dynamically Trace Scheduled VLIW Architecture
In a machine that follows the dynamically trace scheduled VLIW (DTSVLIW) architecture, VLIW instructions are built dynamically through a scheduling algorithm that can be implemented in hardware. These VLIW instructions are cached so that the machine can spend most of its time executing VLIW instructions without sacrificing any binary compatibility. This paper evaluates the effectiveness of the ...
Journal: J. Instruction-Level Parallelism
Volume: 3
Publication date: 2001